library(tidyverse)
library(plotly)
Testing merging data from flumes and rainfall
Introduction
This notebook demonstrates the different functions that have been developed to analyse and check the field data from La Corona, based on the original work from Prof Chip Cheshier and Prof Francois Birgand at North Carolina State University.
Scripts and functions have been developed as part of FPTA 358 funded by INIA Uy.
This document specifically focuses on the merging of different files originating from the rainfall and flume raw data. This will make it possible to produce 6 minute, daily and monthly summaries.
Packages
To be able to run this notebook, the following R packages should be available
Loading the developed functions
The functions are stored in the scripts folder, the same folder as the code version of this file (the *.qmd file).
source("Functions_merge.R")
Order of merging files
According to page 23 (Flumes data) in the “Field data analysis, La Corona NCSU” document, the following order should be used for adding files:
1. ISCO sampler (Nivel): V1 or V2 file with csv ending (the frequency of download is lower as there is more storage)
2. V1 or V2 elevation (Stevens file with csv.csv ending)
3. S1 or S2 elevation (HOBO20)
4. E1/E2 elevation (the emergency spillway; HOBO20)
Testing the functions
We can now start testing the functions.
Data_dir = "../All_Raw_Data/Test_T4T52023_folder/T042223"
Read in two processed files (rainfall and flume data)
First list the names of the processed files
rain_files <- list.files(paste0(Data_dir,"/rain/processed"))
flume_files <- list.files(paste0(Data_dir,"/flumes/processed"))
We have the following processed rain files
rain_files
[1] "OTIP_EM042223.csv" "OTIP_R1042223.csv" "OTIP_R7042223.csv"
We have the following processed flume files
flume_files
[1] "E1042223_processed.csv" "E2042223_processed.csv"
[3] "s1042223_processed.csv" "S2042223_processed.csv"
[5] "V1042223csv_processed.csv" "v2042223csv_processed.csv"
[7] "V3042223csv_processed.csv" "V3P042223_processed.csv"
[9] "V4042223_processed.csv" "V4042223csv_processed.csv"
[11] "v4p042223_processed.csv"
Now read in two of the files (change the file names manually):
rainfall
ISCO nivel (a .csv file)
But other files are possible too
Note that here we are using read_flume_output2() which is slightly different to read_flume_output() in Functions_Flumes.R
The file for ISCO nivel is only downloaded at some dates, so we need to check if this file exists, and otherwise create a “dummy” dataframe.
rain_file = "OTIP_R7042223.csv"
ISCO_nivel = "V1042223_processed.csv"
rain <- read_otip_file(name = rain_file,
path_to_file = paste0(Data_dir,"/rain/processed"))
Rows: 182 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (1): Difference
dttm (1): Date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
if(file.exists(paste0(Data_dir,"/Flumes/processed/",ISCO_nivel))) {
ISCOnivel <- read_flume_output2(name = ISCO_nivel,
path_to_file = paste0(Data_dir,
"/Flumes/processed"))
} else {
message("ISCO nivel file does not exist: create a dummy df")
#create a dummy file
ISCOnivel <- tribble(
~`Date and Time`, ~`Water Level, meters`,
rain$`Date and Time`[nrow(rain)], NA
)
}
ISCO nivel file does not exist: create a dummy df
Merging and summarising
Test summarising the files by 6 min and day before merging. This is to align the timesteps between the different loggers. As "6 min" is the default, there is no need for a timestep argument.
Rainfall events are summed within each summary interval, while for flume data the mean of velocity, height and flow is taken.
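The actual sum_fun() is defined in Functions_merge.R and may differ in detail; a hedged sketch of the behaviour described above (assuming a `Date and Time` timestamp column, an Event rainfall column that is summed, and numeric sensor columns that are averaged) could look like:

```r
# Sketch only: the real sum_fun() lives in Functions_merge.R.
library(dplyr)
library(lubridate)

sum_fun_sketch <- function(df, timestep = "6 mins") {
  df %>%
    # snap each timestamp down to the start of its summary interval
    mutate(`Date and Time` = floor_date(`Date and Time`, unit = timestep)) %>%
    group_by(`Date and Time`) %>%
    # sum rainfall events, average everything else (levels, velocity, flow)
    summarise(across(where(is.numeric),
                     ~ if (cur_column() == "Event") sum(.x, na.rm = TRUE)
                       else mean(.x, na.rm = TRUE)),
              .groups = "drop")
}
```

The column names and the Event/mean split here are assumptions based on the output shown in this notebook, not the confirmed implementation.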
test_summary_rain <- sum_fun(rain)
test_summary_flume <- sum_fun(ISCOnivel)
Show the results
test_summary_flume
# A tibble: 1 × 2
`Date and Time` `Water Level, meters`
<dttm> <dbl>
1 2023-04-22 14:18:00 NaN
We will now merge these two files.
In this case the rainfall file should be df1, as it has the densest observations. The function uses full_join() to make sure all observations in df2 are also kept (even if there is no matching time in df1).
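merge_data() itself is defined in Functions_merge.R; a minimal sketch consistent with the join message printed below (assuming it is essentially a full_join() on the timestamp column) would be:

```r
# Sketch only: the real merge_data() is in Functions_merge.R.
library(dplyr)

merge_data_sketch <- function(df1, df2, by = "Date and Time") {
  # full_join() keeps every timestamp from both data frames,
  # filling NA where one of the loggers has no observation
  full_join(df1, df2, by = by) %>%
    arrange(.data[[by]])
}
```

The `by` default and the final arrange() are assumptions for illustration.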
merged_df <- merge_data(test_summary_rain, test_summary_flume)
Joining with `by = join_by(`Date and Time`)`
Show the merged data
head(merged_df)
# A tibble: 6 × 3
`Date and Time` Event `Water Level, meters`
<dttm> <dbl> <dbl>
1 2023-03-28 12:54:00 0 NA
2 2023-03-28 13:00:00 0 NA
3 2023-03-28 13:06:00 0 NA
4 2023-03-28 13:12:00 0 NA
5 2023-03-28 13:18:00 0 NA
6 2023-03-28 13:24:00 0 NA
Summarising across days and months
We can now also test the summaries across days and months. In this case the timestep argument needs to be defined as "day" or "month".
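The timestep argument presumably just changes the unit that timestamps are floored to; lubridate's floor_date() illustrates the effect of the "day" and "month" units:

```r
# How "day" and "month" flooring behave on a 6-minute timestamp
library(lubridate)

x <- as.POSIXct("2023-04-22 14:18:00", tz = "UTC")
floor_date(x, "day")    # 2023-04-22 00:00:00 UTC
floor_date(x, "month")  # 2023-04-01 00:00:00 UTC
```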
day_summary_rain <- sum_fun(rain, timestep = "day")
day_summary_flume <- sum_fun(ISCOnivel, timestep = "day")
month_summary_rain <- sum_fun(rain, timestep = "month")
month_summary_flume <- sum_fun(ISCOnivel, timestep = "month")
These can also be merged
day_merged_df <- merge_data(day_summary_rain, day_summary_flume)
Joining with `by = join_by(`Date and Time`)`
month_merged_df <- merge_data(month_summary_rain,
month_summary_flume)
Joining with `by = join_by(`Date and Time`)`
Graphs
With the merged data sets we can now plot the variables against each other.
Starting with the 6 min data
p <- merged_df %>%
#pivot_longer(2:3,values_to = "value", names_to="variable") %>%
ggplot(aes(`Date and Time`, Event*0.2)) +
geom_point(colour = "blue") +
#geom_bar(stat = "identity", fill = "blue") +
geom_line(aes(`Date and Time`, `Water Level, meters`), colour = "red") + theme_bw() +
scale_y_continuous(sec.axis = sec_axis(trans = ~., name = "Water Level in Meters"))
Warning: The `trans` argument of `sec_axis()` is deprecated as of ggplot2 3.5.0.
ℹ Please use the `transform` argument instead.
p
Warning: Removed 6015 rows containing missing values or values outside the scale range
(`geom_line()`).
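As the warning says, ggplot2 3.5.0 renamed sec_axis()'s trans argument to transform. A minimal example of the updated call (using a built-in dataset purely for illustration; requires ggplot2 >= 3.5.0):

```r
# Pass `transform` (not the deprecated `trans`) to sec_axis()
library(ggplot2)

p_fixed <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  scale_y_continuous(
    sec.axis = sec_axis(transform = ~ . * 0.1, name = "scaled axis")
  )
```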

ggplotly(p)
p1 <- day_merged_df %>%
#pivot_longer(2:3,values_to = "value", names_to="variable") %>%
ggplot(aes(`Date and Time`, Event)) + geom_bar(stat = "identity", fill = "blue") +
geom_line(aes(`Date and Time`, `Water Level, meters`*50), colour = "red") + theme_bw() +
scale_y_continuous(
# Features of the first axis
name = "Event",
# Add a second axis and specify its features
sec.axis = sec_axis( trans=~./50, name="Water Level in Meters")
)
p1
Warning: Removed 26 rows containing missing values or values outside the scale range
(`geom_line()`).

ggplotly(p1)
p2 <- month_merged_df %>%
#pivot_longer(2:3,values_to = "value", names_to="variable") %>%
ggplot(aes(`Date and Time`, Event*0.2)) + geom_bar(stat = "identity", fill = "blue") +
geom_line(aes(`Date and Time`, `Water Level, meters`*1000), colour = "red") + theme_bw() +
scale_y_continuous(
# Features of the first axis
name = "Event",
# Add a second axis and specify its features
sec.axis = sec_axis( trans=~./1000, name="Water Level in Meters")
)
#p1
ggplotly(p2)
Adding another file (Hobo Stevens)
The next test is whether we can add a further file to the data. Reading in a Stevens elevation file (the csv.csv type):
file_Stevens = "v1042223csv_processed.csv"
flume <- read_flume_output2(name = file_Stevens,
path_to_file = paste0(Data_dir,"/Flumes/processed"))
Date in ISO8601 format; converting timezone from UTC to "America/Argentina/Buenos_Aires".
Summarise this data to 6 min data
test_summary_stevens <- sum_fun(flume)
test_summary_stevens
# A tibble: 6,002 × 2
`Date and Time` `Water Level, meters`
<dttm> <dbl>
1 2023-03-28 14:24:00 0.351
2 2023-03-28 14:30:00 0.415
3 2023-03-28 14:36:00 0.537
4 2023-03-28 14:42:00 0.439
5 2023-03-28 14:48:00 0.392
6 2023-03-28 14:54:00 0.321
7 2023-03-28 15:00:00 0.320
8 2023-03-28 15:06:00 0.320
9 2023-03-28 15:12:00 0.321
10 2023-03-28 15:18:00 0.321
# ℹ 5,992 more rows
Now add this to the other two datasets
merged_three_df <- merge_data(merged_df,test_summary_stevens, by = "Date and Time")
merged_three_df
# A tibble: 6,017 × 4
`Date and Time` Event `Water Level, meters.x` `Water Level, meters.y`
<dttm> <dbl> <dbl> <dbl>
1 2023-03-28 12:54:00 0 NA NA
2 2023-03-28 13:00:00 0 NA NA
3 2023-03-28 13:06:00 0 NA NA
4 2023-03-28 13:12:00 0 NA NA
5 2023-03-28 13:18:00 0 NA NA
6 2023-03-28 13:24:00 0 NA NA
7 2023-03-28 13:30:00 0 NA NA
8 2023-03-28 13:36:00 0 NA NA
9 2023-03-28 13:42:00 0 NA NA
10 2023-03-28 13:48:00 0 NA NA
# ℹ 6,007 more rows
Manually change the names of the water level columns
You have to change this depending on the files that you have merged
colnames(merged_three_df)[3:4] <- c("ISCO Water Level, meters", "Stevens, Water Level, meters")
Now do the plotting. This might require a bit of scaling to make sure this plots well.
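Positional renaming with colnames()[3:4] silently renames the wrong columns if the merge ever produces a different column order. A name-based alternative with dplyr::rename(), assuming the .x/.y suffixes that full_join() produced above, could be:

```r
# Name-based renaming: errors loudly if the expected columns are absent,
# instead of silently renaming whatever sits in positions 3 and 4
library(dplyr)
library(tibble)

relabel_levels <- function(df) {
  rename(df,
         "ISCO Water Level, meters"     = "Water Level, meters.x",
         "Stevens, Water Level, meters" = "Water Level, meters.y")
}
```

The suffixed input names here are taken from the merged_three_df output shown above.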
p3 <- plot_merged(merged_three_df)
p3
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 6032 rows containing missing values or values outside the scale range
(`geom_line()`).

ggplotly(p3)
This indicates that the Stevens offset is wrong; possibly the tape has shifted.
Add a final file (HOBO20 upstream)
This is an S1 or S2 file
flume_hobo20 <- read_flume_output2(name = "S1042223_processed.csv",
path_to_file = paste0(Data_dir,"/Flumes/processed"))
Date in ISO8601 format; converting timezone from UTC to "America/Argentina/Buenos_Aires".
Again, summarise first to 6 min data:
test_summary_hobo20 <- sum_fun(flume_hobo20)
test_summary_hobo20
# A tibble: 6,003 × 2
`Date and Time` `Water Level, meters`
<dttm> <dbl>
1 2023-03-28 14:30:00 NaN
2 2023-03-28 14:36:00 NaN
3 2023-03-28 14:42:00 NaN
4 2023-03-28 14:48:00 NaN
5 2023-03-28 14:54:00 NaN
6 2023-03-28 15:00:00 0.0237
7 2023-03-28 15:06:00 0.0237
8 2023-03-28 15:12:00 0.0233
9 2023-03-28 15:18:00 0.025
10 2023-03-28 15:24:00 0.0257
# ℹ 5,993 more rows
Add this to the rest of the data
merged_four_df <- merge_data(merged_three_df,test_summary_hobo20, by = "Date and Time")
merged_four_df
# A tibble: 6,019 × 5
`Date and Time` Event `ISCO Water Level, meters` Stevens, Water Level, …¹
<dttm> <dbl> <dbl> <dbl>
1 2023-03-28 12:54:00 0 NA NA
2 2023-03-28 13:00:00 0 NA NA
3 2023-03-28 13:06:00 0 NA NA
4 2023-03-28 13:12:00 0 NA NA
5 2023-03-28 13:18:00 0 NA NA
6 2023-03-28 13:24:00 0 NA NA
7 2023-03-28 13:30:00 0 NA NA
8 2023-03-28 13:36:00 0 NA NA
9 2023-03-28 13:42:00 0 NA NA
10 2023-03-28 13:48:00 0 NA NA
# ℹ 6,009 more rows
# ℹ abbreviated name: ¹`Stevens, Water Level, meters`
# ℹ 1 more variable: `Water Level, meters` <dbl>
Manually change the name of the last column
merged_four_df <- merged_four_df %>%
rename("HOBO20 Water Level, meters" = "Water Level, meters")
We can again plot this
p4 <- plot_merged(merged_four_df)
p4
Warning: Removed 12 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 6060 rows containing missing values or values outside the scale range
(`geom_line()`).

ggplotly(p4)
# # testing ft
# merged_four_df_test <- merged_four_df %>%
# mutate(`ISCO Water Level, meters` = 0.3*`ISCO Water Level, meters`)
# p4a <- merged_four_df_test %>%
# pivot_longer(3:ncol(merged_four_df),values_to = "value",
# names_to="variable") %>%
# ggplot(aes(`Date and Time`, Event*0.2)) +
# geom_point(colour = "blue", alpha = 0.5) +
# #geom_bar(stat="identity", fill = "darkblue") +
# geom_line(aes(`Date and Time`,value, colour =variable)) +
# theme_bw() + ylab("Rainfall (mm)") +
# scale_y_continuous(sec.axis = sec_axis(trans=~., name = "water level (m)"))
# p4a
# ggplotly(p4a)
#
Write out the data
We can now write out the different data sets to files to be used later.
This will be stored in a folder called “combined”.
if (!dir.exists("../All_Raw_Data/Test_T4T52023_folder/combined")) {
dir.create("../All_Raw_Data/Test_T4T52023_folder/combined")
}
split_dir <- strsplit(Data_dir,"/")
file_indicator <- split_dir[[1]][length(split_dir[[1]])]
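Base R's basename() returns the final path component directly, so the two strsplit() lines above could be collapsed into one:

```r
# basename() extracts the last component of a path
Data_dir <- "../All_Raw_Data/Test_T4T52023_folder/T042223"
file_indicator <- basename(Data_dir)  # "T042223"
```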
write_csv(merged_four_df, paste0("../All_Raw_Data/Test_T4T52023_folder/combined/Rain_flumes_combinedV1",
file_indicator,".csv"))
You can store the other data in the same way if needed.